Ego-centered networks and the ripple effect 
— or — 
Why all your friends are weird 
Recent work has demonstrated that many social networks, and indeed many networks of other types also, 
have broad distributions of vertex degree. Here we show that this has a substantial impact on the shape of ego- 
centered networks, i.e., sets of network vertices that are within a given distance of a specified central vertex, 
the ego. This in turn affects concepts and methods based on ego-centered networks, such as snowball sampling 
and the "ripple effect." In particular, we argue that one's acquaintances, one's immediate neighbors in the 
acquaintance network, are far from being a random sample of the population, and that this biases the numbers of 
neighbors two and more steps away. We demonstrate this concept using data drawn from academic collaboration 
networks, for which, as we show, current simple theories for the typical size of ego-centered networks give 
numbers that differ greatly from those measured in reality. We present an improved theoretical model which 
gives significantly better results. 



I. INTRODUCTION 

In social network parlance, an ego-centered network (sometimes
also called a personal network) is a network centered on a
specific individual (generically "actor"), whom we call the
ego. For example, Sigmund Freud and all his
friends would form an ego-centered network. This network 
would have radius 1, meaning we include everyone within 
distance 1 on the friendship network of the central individual, 
Freud in this case. If we also included friends of friends in the 
network, it would have radius 2. In Fig. 1 we show a radius-2
ego-centered network of scientific collaborations. The ego in 
this case is my own: the central vertex in the figure represents 
me, the first ring of vertices around that my coauthors on pa- 
pers published within the last ten years, and the second ring 
their coauthors. As the figure shows, networks of this type can 
grow very rapidly with radius. 

Ego-centered networks are of interest for a number of reasons.
For instance, in two recent papers, Bernard et al. address
the following question. Consider some subset of the
population, consisting of e people. They could be people in a 
particular demographic or social group, or the people involved 
in a particular event. How many of these e people, if any, is 
the typical person likely to know? As Bernard et al. show, 
this is easy to calculate. If the total population who might be 
involved in the event is t, then each member of that population
has a probability p = e/t of being involved. If the average
person knows c other people, then the average number
of those people who were involved is simply m = cp = ce/t.
Bernard et al. take the example of the population of the United 
States, for which they estimate from previous empirical studies
that the average person has a social circle of about c = 290
people, and for which the total population currently stands
at around 280 million. Thus the ratio t/c ≈ 1 000 000 in this
case, giving the simple rule of thumb

\[ m = \frac{e}{1\,000\,000}. \tag{1} \]

Simply stated, this equation says that the average individual
living in the United States is acquainted with about one person
in a million out of the country's total population.

As an example, let us apply the method to the problem of
estimating how many HIV positive individuals the average
person in the US knows. At the time of writing, there were
about 800 000 known cases of HIV in the US (including those
who have died). The number of actual cases is probably substantially
greater than this and is estimated to be somewhere
between 1.0 and 1.5 million. To take a conservative figure, let
us suppose that the actual total is e = 1 million. From Eq. (1),
we then estimate that on average each member of the US population
as a whole has or had one acquaintance who is or was
HIV positive. It must be emphasized that this is an average
figure. HIV positive individuals are not a uniform sample of
the population. Nonetheless, Eq. (1) is expected to give a correct
population average of m.

FIG. 1: An ego-centered network of scientific collaborations, centered
in this case on the author of this paper, represented by the vertex
in the middle of the figure. The two surrounding rings represent
his collaborators, and their collaborators. Collaborative ties between
members of the same ring, of which there are many, have been omitted
from the figure for clarity.
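As a rough numerical illustration of the rule of thumb m = ce/t, the short sketch below plugs in the figures quoted above (c = 290, t = 280 million, e = 1 million). The function name `expected_acquaintances` is ours, chosen for illustration only.

```python
def expected_acquaintances(e, t=280_000_000, c=290):
    """Mean number of group members a random person knows: m = c * e / t."""
    return c * e / t


# HIV example from the text: e = 1 million cases in a population of 280 million.
m_hiv = expected_acquaintances(1_000_000)
print(round(m_hiv, 2))  # close to 1, as the rule of thumb m = e / 1 000 000 suggests
```

With the exact ratio t/c rather than the rounded value of one million, the result is slightly above 1, which is consistent with the approximate nature of Eq. (1).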

Now we want to extend this calculation one step further. If a 
person has no immediate friends in the group under consider- 
ation, how many of their friends' friends are in this group? Al- 
ternatively, one could rephrase the question and enquire how 
many people in the population as a whole have one or more 
friends of friends in the specified group: one can visualize a 
group or event as the center of a set of ever widening circles 
of influence in the social network. Colloquially, this is what 
we call the "ripple effect." The two questions here are equiva- 
lent, but not identical. In this paper we speak in the language 
of the former, which focuses our attention on the calculation 
of the number of actors two steps away from the ego in an 
ego-centered network. 

Unfortunately, the calculation of this number is not simple. 
An approximate solution has been given previously but, as we will
demonstrate, this solution misses some important features of 
real social networks and as a result can give answers that are 
inaccurate. The crucial point is that in many networks there 
exist a small number of actors with an anomalously large num- 
ber of ties. While it may appear safe to ignore these actors 
because they form only a small fraction of the population, we 
show that in fact this is not so. Because of the way the rip- 
ple effect works, this small minority has a disproportionately 
large influence, and ignoring them can produce inaccurate es- 
timates for the figures of interest. We show here how to per- 
form calculations that take these issues into account correctly. 

The topic of this paper is also of interest in some other ar- 
eas of social network theory. One such area is "snowball sam- 
pling," an empirical technique for sampling social networks 
that attempts to reconstruct the ego-centered network around a 
given central actor. In this technique, the central actor is first
polled to determine the identities of other actors with whom 
he or she has ties. Then those actors are polled to determine 
their ties, and so forth, through a succession of generations of 
the procedure. The statistical properties of snowball samples 
have been studied using Markov chain theory, and the technique
has been shown to give good (or at least predictable)
samples of populations in the limit where a large number of 
generations of actors is polled. Unfortunately, in most practi- 
cal studies only a small number of generations is polled, and 
in this case, as we will see, the sample may be biased in a se- 
vere fashion: snowball samples, like calculations of the ripple 
effect, are highly sensitive to the presence in the population of 
a small number of actors with an unusually large number of 
ties. 
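The polling procedure described above can be sketched as a breadth-first expansion in generations. This is a minimal illustration, assuming the network is available as an adjacency dictionary; the function `snowball_sample` and the toy network are hypothetical and not taken from any study.

```python
def snowball_sample(adjacency, ego, generations):
    """Return a list of waves: waves[0] is {ego}, waves[g] holds the actors
    first reached in generation g of polling."""
    seen = {ego}
    waves = [{ego}]
    for _ in range(generations):
        frontier = set()
        for actor in waves[-1]:
            frontier.update(adjacency.get(actor, ()))  # "poll" this actor for ties
        frontier -= seen                               # keep only newly found actors
        if not frontier:
            break
        seen |= frontier
        waves.append(frontier)
    return waves


# Hypothetical toy network with one well-connected actor ("hub").
toy = {
    "ego": ["a", "b"],
    "a": ["ego", "b", "hub"],
    "b": ["ego", "a"],
    "hub": ["a", "x", "y", "z"],
    "x": ["hub"], "y": ["hub"], "z": ["hub"],
}
waves = snowball_sample(toy, "ego", generations=3)
print([sorted(w) for w in waves])
```

Even in this tiny example, once the well-connected actor enters the sample, the next wave is dominated by that actor's ties, which is the sensitivity the text describes.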

The outline of this paper is as follows. In Section II we
calculate exactly the expected number of network neighbors at
distance two from a central individual, in a network without
transitive triples. In Section III we show how the resulting
expression is modified when the network has transitivity, and
in Section IV we apply our theory to two example networks,
showing that in practice it appears to work extremely well. In
Section V we give our conclusions.



II. FRIENDS OF FRIENDS 

So how do you estimate the number of people who are two 
steps away from you in a social network (or indeed in a net- 
work of any kind)? Bernard et al. suggest the following simple
method. If each actor in a network has ties to c others on
average, and each of those has ties to c others, then the average
number of actors two steps away is c^2. There are some problems
with this, however. First, as has been pointed out previously,
people who know one another tend to have strongly overlapping
circles of acquaintance, so that not all of the c people your friend
knows are new to you; many of them are probably friends of
yours. In other circles this effect is called network transitivity
or clustering, and it is also related to the concept of network
density. Typically, the mean number of people two steps
away from an actor can be reduced by a factor of two or so by
transitivity effects. Bernard et al. allow for this by including a
"lead-in factor" λ in their calculation. We discuss transitivity
in more detail in Section III.

Even if we ignore the effects of transitivity, however, there 
is a substantial problem with the simple estimate of the num- 
ber of one's second neighbors in a social network. By approximating
this number as c^2 we are assuming that the people
we know are by and large average members of the population, 
who themselves know average numbers of other people. But 
we would be quite wrong to make such an assumption. The 
people we know are anything but average. 

Consider two (fictitious) individuals. Individual A is a her- 
mit with a lousy attitude and bad breath to the point where it 
interferes with satellite broadcasts. He has only 10 acquain- 
tances. Individual B is erudite, witty, charming, and a profes- 
sional politician. She has 1000 acquaintances. Is the average 
person equally likely to know A and B? Absolutely not. The 
average person is 100 times more likely to know B than A,
since B knows 100 times as many people. Extending this argument
to one's whole circle of friends, it is clear that the people
one knows will, overall, tend to be people with more than
the average number of acquaintances. This means that the
total number of their friends — the people two steps away — 
will be larger than our simple estimate would suggest. And as 
we will show, it may be very much larger. 

The fundamental concept that we need to capture here is 
that not all people have the same number of acquaintances. In 
the language of social network analysis, there is a distribution 
of the degrees of vertices in the social network. (Recall that 
the degree of a vertex is the number of other vertices to which 
it is directly connected.) Let k denote the degree of a vertex 
and p_k the degree distribution, i.e., the probability that a vertex
chosen uniformly at random from the network will have
degree k. Thus, for example, the mean degree c of a vertex is

\[ c = \sum_{k=0}^{\infty} k p_k. \tag{2} \]



Degree distributions have been measured for a variety of
networks, and in many cases are found to show great
variation. It is certainly not the case that vertices
always have degree close to the mean (although they may in
some networks). A clear example of this can be seen in
Fig. 1, in which some vertices have degree only 1, while at
least one has degree greater than 100.

Now the fundamental point of this paper may be expressed 
as follows. The distribution of the degrees of the vertices to 
which a random vertex is connected is not given by p_k. The
probability that you know a particular person is proportional
to the number of people they know, and hence the distribution
of their degree is proportional to k p_k and not just p_k. The
correctly normalized distribution is thus

\[ q_k = \frac{k p_k}{\sum_{j=0}^{\infty} j p_j} = \frac{k p_k}{c}. \tag{3} \]
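To make Eq. (3) concrete, the following sketch computes q_k for a hypothetical population made up of the two kinds of individual described earlier: 99% "hermits" with 10 acquaintances and 1% "politicians" with 1000. The mixture proportions and the function name are assumptions made purely for illustration.

```python
from fractions import Fraction


def neighbor_degree_distribution(p):
    """Eq. (3): q_k = k p_k / sum_j j p_j, with p given as {degree: probability}."""
    c = sum(k * pk for k, pk in p.items())  # mean degree, Eq. (2)
    return {k: k * pk / c for k, pk in p.items()}


# Hypothetical population: 99% have degree 10, 1% have degree 1000.
p = {10: Fraction(99, 100), 1000: Fraction(1, 100)}
q = neighbor_degree_distribution(p)

mean_degree = sum(k * pk for k, pk in p.items())
mean_neighbor_degree = sum(k * qk for k, qk in q.items())
print(float(q[1000]), float(mean_degree), float(mean_neighbor_degree))
```

Although only one person in a hundred has degree 1000 in this made-up population, q_1000 = 100/199, so roughly half of a typical person's acquaintances are of that type, and the mean degree of a neighbor is far above the population mean.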



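Now consider the number of vertices two steps away from a
given vertex. The probability P(k_2|k_1) that this number is k_2,
given that the number of vertices one step away is k_1, is

\[ P(k_2|k_1) = \underbrace{\sum_{m_1=1}^{\infty} \cdots \sum_{m_{k_1}=1}^{\infty}}_{\text{degrees of neighbors}} \; \underbrace{\delta\Bigl(\sum_{i=1}^{k_1} (m_i - 1),\, k_2\Bigr)}_{\text{degrees sum to } k_2} \; \prod_{j=1}^{k_1} q_{m_j}, \tag{4} \]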



where δ(m,n) is 1 if m = n and 0 otherwise. Note the occurrence
of m_i − 1 in this expression; the amount that your ith
neighbor contributes to the total number of your second neighbors
is one less than his or her degree, because one of his or
her neighbors is you. The overall probability that the number
of second neighbors is k_2 can then be calculated by averaging
Eq. (4) over k_1:

\[ P(k_2) = \sum_{k_1=0}^{\infty} p_{k_1} P(k_2|k_1). \tag{5} \]



We want the mean value of k_2, which we will denote c_2, and
this is given by

\[ c_2 = \overline{k_2} = \sum_{k_2=0}^{\infty} k_2 P(k_2). \tag{6} \]



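Combining Eqs. (4)-(6), we thus arrive at the quantity we are
interested in:

\begin{align}
c_2 &= \sum_{k_2=0}^{\infty} k_2 \sum_{k_1=0}^{\infty} p_{k_1} P(k_2|k_1)
     = \sum_{k_1=0}^{\infty} p_{k_1} \sum_{k_2=0}^{\infty} k_2 P(k_2|k_1) \notag \\
    &= \sum_{k_1=0}^{\infty} p_{k_1} \sum_{m_1=1}^{\infty} \cdots \sum_{m_{k_1}=1}^{\infty} \sum_{k_2=0}^{\infty} k_2\, \delta\Bigl(\sum_{i=1}^{k_1} (m_i - 1),\, k_2\Bigr) \prod_{j=1}^{k_1} q_{m_j} \notag \\
    &= \sum_{k_1=0}^{\infty} p_{k_1} \sum_{i=1}^{k_1} \sum_{m_1=1}^{\infty} \cdots \sum_{m_{k_1}=1}^{\infty} (m_i - 1) \prod_{j=1}^{k_1} q_{m_j} \notag \\
    &= \sum_{k_1=0}^{\infty} k_1 p_{k_1} \Bigl[\sum_{m=1}^{\infty} q_m\Bigr]^{k_1 - 1} \sum_{k=0}^{\infty} (k - 1) q_k \notag \\
    &= \sum_{k=0}^{\infty} k(k - 1) p_k = \overline{k^2} - \overline{k}. \tag{7}
\end{align}

This result

\[ c_2 = \overline{k^2} - \overline{k} \tag{8} \]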



is the correction we were looking for to the simple estimate
of the number of vertices two steps away. The number of
vertices two steps away is given by the mean square degree 
minus the mean degree. The important point to notice is that 
this expression depends on the average of the square of a ver- 
tex's degree, rather than the square of the average, as the sim- 
ple estimate assumes. If the degrees of vertices are narrowly 
distributed about their mean, then these two quantities will be 
approximately equal and the simple estimate will give roughly 
the right result. As mentioned above, however, many net- 
works have broad degree distributions, and in this case the 
average of the square and the square of the average will take 
very different values. In general, we can write 



\[ c_2 = \overline{k}^2 - \overline{k} + \bigl(\overline{k^2} - \overline{k}^2\bigr) = c^2 - c + \sigma^2, \tag{9} \]

where σ^2 is the variance of the degree distribution and c is, as
before, its mean. Normally, σ^2 ≫ c and so the difference between
the simple estimate c^2 and the true value of c_2 is about
equal to the variance. In Section IV we give some examples
of real networks for which the variance is large (much larger
than c^2 itself) and hence for which the simple estimate gives
poor results.
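The result of this section can be checked numerically. The sketch below, which is an illustration rather than part of the original analysis, evaluates Eq. (8) exactly for a small made-up degree distribution, and then estimates the same quantity by direct sampling: draw k_1 from p_k, draw k_1 neighbor degrees from q_k, and add up the m_i − 1 contributions. The distribution, function names, trial count, and seed are all arbitrary choices.

```python
import random


def second_neighbors_exact(p):
    """Eq. (8): c_2 = <k^2> - <k>, with p given as {degree: probability}."""
    mean_k = sum(k * pk for k, pk in p.items())
    mean_k2 = sum(k * k * pk for k, pk in p.items())
    return mean_k2 - mean_k


def second_neighbors_sampled(p, trials=100_000, seed=1):
    """Monte Carlo version of Eqs. (4)-(6), ignoring transitivity."""
    rng = random.Random(seed)
    degrees = list(p)
    p_weights = [p[k] for k in degrees]
    q_weights = [k * p[k] for k in degrees]  # unnormalized q_k; choices() normalizes
    total = 0
    for _ in range(trials):
        k1 = rng.choices(degrees, weights=p_weights)[0]
        if k1:
            neighbor_degrees = rng.choices(degrees, weights=q_weights, k=k1)
            total += sum(m - 1 for m in neighbor_degrees)
    return total / trials


# Made-up broad distribution: most vertices have low degree, a few have degree 10.
p = {1: 0.5, 2: 0.3, 10: 0.2}
c = sum(k * pk for k, pk in p.items())
variance = sum(k * k * pk for k, pk in p.items()) - c ** 2

exact = second_neighbors_exact(p)
sampled = second_neighbors_sampled(p)
print(c ** 2, exact, sampled)
```

For this distribution the simple estimate c^2 is about 9.6, whereas Eq. (8) gives 18.6, and the sampled value agrees with the latter to within sampling noise. The gap between the two is c^2 − c + σ^2 versus c^2, as in Eq. (9).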



III. TRANSITIVITY AND MUTUALITY 

The calculation of the previous section is incomplete for a 
number of reasons. Chief among these is that it misses the 
effect of network transitivity or clustering. In most social net- 
works, adjacent actors have strongly overlapping sets of ac- 
quaintances. To put this another way, there is a strong proba- 
bility that a friend of your friend is also your friend. Transi- 
tivity can be measured by the quantity 



\[ C = \frac{6 \times \text{number of triangles in the network}}{\text{number of paths of length two}}. \tag{10} \]



Here paths of length two are considered directed and start at 
a specified vertex. A "triangle" is any set of three vertices 
all of which are connected to each of the others. The factor 
of six in the numerator accounts for the fact that each triangle 
contributes six paths of length two to the network, two starting 
at each of its vertices. This definition is illustrated in Fig. 2.
Simply put, C is the probability that a friend of one of your 
friends will also be your friend. 

The quantity C has been widely studied in the theoretical 
literature, and its value has been measured for many different
networks. Watts and Strogatz have dubbed it the clustering
coefficient. It is sometimes also known as the "fraction of
transitive triples" in the network. Eq. (10) is not in the form of
the standard definition, and so may not be immediately recognizable
as the same quantity discussed elsewhere. The most
commonly used definition is



\[ C = \frac{3 \times \text{number of triangles in the network}}{\text{number of connected triples of vertices}}, \tag{11} \]



where a "connected triple" means a vertex that is connected to
an (unordered) pair of other vertices. It takes only a moment to
convince oneself that the two definitions are equivalent; see
Fig. 2 again. (Note that paths are ordered in (10) and triples
are unordered in (11), which accounts for an apparent difference
of a factor of two between the two definitions.)

FIG. 2: An illustration of the calculation of the clustering coefficient
for a small network. Vertex A has three paths of length 2 leading
from it, as marked. Similarly vertices B, C, and D have 3, 2, and 2
such paths, for a total of 3 + 3 + 2 + 2 = 10. There is one triangle in
the network. Hence, from Eq. (10), the clustering coefficient is
6 × 1/10 = 3/5. Alternatively, one can count the number of connected triples
of vertices, of which there are five, one each centered on vertices A
and B, three on vertex C, and none on vertex D. Using Eq. (11), the
clustering coefficient is then 3 × 1/5 = 3/5 again.
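The equivalence of the two definitions can also be checked mechanically. The sketch below computes C both ways for a four-vertex network. The edge list (a triangle A, B, C with D attached to C) is our assumption: it is one network that reproduces the path and triple counts quoted in the caption of Fig. 2, not a transcription of the figure itself.

```python
from itertools import combinations


def clustering_two_ways(adjacency):
    """Return (C from Eq. (10), C from Eq. (11)) for an undirected simple graph
    given as {vertex: set of neighbors}."""
    triangles = sum(
        1
        for a, b, c in combinations(sorted(adjacency), 3)
        if b in adjacency[a] and c in adjacency[a] and c in adjacency[b]
    )
    # Ordered paths of length two starting at v: step to a neighbor u,
    # then on to any neighbor of u other than v.
    paths = sum(len(adjacency[u]) - 1 for v in adjacency for u in adjacency[v])
    # Connected triples: a central vertex plus an unordered pair of its neighbors.
    triples = sum(len(n) * (len(n) - 1) // 2 for n in adjacency.values())
    return 6 * triangles / paths, 3 * triangles / triples


# Assumed edges: triangle A-B-C, plus D attached to C.
fig2_like = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
c_paths, c_triples = clustering_two_ways(fig2_like)
print(c_paths, c_triples)
```

Both expressions give 3/5 for this network, in agreement with the counts in the caption (10 ordered paths, 5 connected triples, 1 triangle).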

What effect does clustering have on our calculation of the 
number c_2 of second-nearest neighbors in the network? Consider
a vertex with degree m lying in the first "ring" of our
ego-centered network, i.e., one of the immediate neighbors of
the central vertex. Previously we considered all but one of
this vertex's m neighbors to be second neighbors of the central
vertex. (The remaining one is the central vertex itself.)
This is why the term m − 1 appears in Eq. (4). Now, however,
we realize that in fact an average fraction C of those m − 1
neighbors are themselves neighbors of the central vertex and
hence should not be counted as second neighbors. Thus m − 1
in Eq. (4) should be replaced with (1 − C)(m − 1).

Making this substitution in Eq. (7) we immediately see that

\[ c_2 = (1 - C)\bigl(\overline{k^2} - \overline{k}\bigr). \tag{12} \]



This result is in general only approximate, because the probability
of a vertex having a tie to another in the first ring is
presumably not independent of the degrees m_i of the other vertices.
As we show in the following section, however, Eq. (12)
gives considerably better estimates of c_2 than our first attempt,
Eq. (8).

But this is not all. There is another effect we need to take 
into account if we are to estimate c_2 correctly. It is also possible
that we are over-counting the number of second neighbors
of the central individual in the network because some of
them are friends of more than one friend. In other words, you
may know two people who have another friend in common,
whom you personally don't know. Such relationships create
"squares" in the network, rather than the triangles of the simple
transitivity. To quantify the density of these squares, we
define another quantity which we call the mutuality M:

\[ M = \frac{\text{mean number of vertices two steps away}}{\text{mean paths of length two to those vertices}}. \tag{13} \]



In words, 1/M is the mean number of paths of length
two leading to each of your second neighbors. Because of the squares
in the network, Eq. (12) overestimates c_2 by exactly a factor
of 1/M, and hence our theory can be fixed by replacing m − 1
in Eq. (4) by M(1 − C)(m − 1).

FIG. 3: (a) An example of an actor (F) who is two steps away from
the ego (E, shaded), but is friends with two of E's friends (A and B).
F should only be counted once as a friend of a friend of E, not twice.
(b) A similar situation in which A and B are also friends of one another.
(c) The probability of situation (b) can be calculated by considering
this situation. Since A is friends with both B and F, the
probability that B and F also know one another (dotted line), thereby
completing the quadrilateral in (b), is by definition equal to the
clustering coefficient.

But now we have a problem. Calculating the mutuality M
using Eq. (13) requires that we know the mean number of individuals
two steps away from the central individual. But this
is precisely the quantity c_2 that our calculation is supposed to
estimate in the first place. Our entire goal here is to estimate
c_2 without having to measure it directly, which would in any
case be quite difficult for most networks. There is however a
solution to this problem. Consider the two configurations depicted
in Fig. 3, parts (a) and (b). In (a), the ego, denoted E
and shaded, has two friends A and B, both of whom know F, 
although F is a stranger to E. The same is true in (b), but now 
A and B are friends of one another also. For many networks 
we find that situation (a) is quite uncommon. It is rare to find 
four people arranged in a ring such that each knows two of the 
others, but none of the four knows the person opposite them 
in the ring. Situation (b) is much more common. And it turns 
out that we can estimate the frequency of occurrence of (b) 
from a knowledge of the clustering coefficient. 

Consider Fig. 3c. The central actor E has a tie with A, who
has a tie with F. How many other paths of length two are there
from E to F? Well, if E has k_1 neighbors, as before, then by
the definition (11) of the clustering coefficient, A will have
ties to C(k_1 − 1) of them on average. The tie between actors A
and B in the figure is an example of one such. But now A
has ties to both B and F, and hence, using the definition of the
clustering coefficient again, B and F will themselves have a tie
(dotted line) with probability C. Thus there will on average be
C^2(k_1 − 1) other paths of length 2 to F, or 1 + C^2(k_1 − 1) paths
in total, counting the one that runs through A. This is the average
factor by which we will over-count the number of second
neighbors of E because of the mutuality effect. Substituting
into Eq. (7), we then conclude that our best estimate of c_2 is



\[ c_2 = M (1 - C)\bigl(\overline{k^2} - \overline{k}\bigr), \tag{14} \]

where the mutuality coefficient M is given by

\[ M = \frac{\overline{k / [1 + C^2 (k - 1)]}}{\overline{k}}. \tag{15} \]



Notice that both 1 − C and M tend to 1 as C becomes small,
so that Eq. (14) becomes equivalent to Eq. (8) in a network
where there is no clustering, as we would expect.

In essence what Eq. (15) does is estimate the value of M in
a network in which triangles of ties are common, but squares 
that are not composed of adjacent triangles are assumed to 
occur with frequency no greater than one would expect in a 
purely random network. 

To summarize, if we know the degree distribution and
clustering coefficient of a network (both of which can
be estimated from knowledge of actors' personal radius-1
networks) then we can estimate the number of friends of
friends the typical actor has using Eq. (8), (12), or (14). These
three equations we expect to give successively more accurate
results for c_2. Because we have neglected configurations of
the form shown in Fig. 3a and because of approximations
made in the derivation of Eqs. (12) and (14), we do not expect
any of them to estimate c_2 perfectly. As we will see in
the following section, however, Eq. (14) provides an excellent
guide to the value of c_2 in practice, with only a small error
(less than ten percent in the cases we have examined).
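The three estimates can be computed together from a list of observed degrees and a measured clustering coefficient. The sketch below is one possible arrangement, not the code used for the results in this paper. It assumes the mutuality coefficient takes the form of the mean of k/[1 + C^2(k − 1)] divided by the mean degree, which is the reading of Eq. (15) that tends to 1 as C becomes small. The function name and the sample degrees are hypothetical.

```python
def second_neighbor_estimates(degrees, clustering):
    """Estimates of c_2 from a degree sequence and clustering coefficient C.

    Returns the "good" (Eq. 8), "better" (Eq. 12) and "best" (Eq. 14)
    estimates, plus the mutuality coefficient M (Eq. 15).
    """
    n = len(degrees)
    mean_k = sum(degrees) / n
    mean_k2 = sum(k * k for k in degrees) / n

    good = mean_k2 - mean_k                   # Eq. (8)
    better = (1 - clustering) * good          # Eq. (12)
    # Eq. (15): average of k / [1 + C^2 (k - 1)], normalized by the mean degree.
    mutuality = (
        sum(k / (1 + clustering ** 2 * (k - 1)) for k in degrees) / n
    ) / mean_k
    best = mutuality * better                 # Eq. (14)
    return {"good": good, "better": better, "best": best, "mutuality": mutuality}


# Hypothetical degree sequence and clustering coefficient.
estimates = second_neighbor_estimates([1, 2, 2, 3], clustering=0.5)
print(estimates)

# With no clustering, all three estimates coincide.
no_clustering = second_neighbor_estimates([1, 2, 2, 3], clustering=0.0)
```

Only quantities available from radius-1 personal networks enter the calculation: each actor's degree, and the fraction of pairs of an actor's friends who are themselves friends.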



IV. EXAMPLE APPLICATION 

In this section we test our theory by applying it to two networks
for which we can directly measure the mean number of
second neighbors of a vertex and compare it with the predictions
of Eqs. (8), (12), and (14).

Academic coauthorship networks are one of the best doc- 
umented classes of social networks. In these networks the 
vertices represent the authors of scholarly papers, and two 
vertices are connected by an edge if the two individuals in 
question have coauthored a paper together. With the advent
of comprehensive electronic databases of published papers 
and preprints, large coauthorship networks can be constructed 
with good reliability and a high degree of automation. Coau- 
thorship networks are true social networks in the sense that 
two individuals who have coauthored a paper are very likely 
to be personally acquainted. (There are exceptions, particu- 
larly in fields such as high-energy physics, where author lists 
running to hundreds of names are not uncommon. We will not 
be dealing with such exceptions here, however.) 

We examine two different coauthorship networks: 

1. A network of collaborations between 1.5 million scientists
   in biomedicine, compiled by the present author
   from all publications appearing between 1995 and 1999
   inclusive in the Medline bibliographic database, which
   is maintained by the National Institutes of Health.

2. A network of collaborations between a quarter of a million
   mathematicians, kindly provided to the author by
   Jerrold Grossman and Patrick Ion, who compiled it
   from data provided by the American Mathematical Society.

FIG. 4: Degree distributions of the two academic coauthorship networks
discussed in the text. Axes are logarithmic.



In Fig. 4 we show the degree distributions of these networks.
As the figure shows, neither is narrowly distributed about its 
mean. Both in fact are almost power-law in form, with long 
tails indicating that there are a small number of individuals 
in the network with a very large number of collaborators. In 
the network of mathematicians, for instance, a plurality (about 
a third) of individuals who have collaborated at all have de- 
gree 1, i.e., have collaborated with only one other. But there 
is one individual in the network, the legendary Hungarian
Paul Erdős, who collaborated with a remarkable 502 others.
(This number is a lower bound; even though Erdős died
in 1996, new collaborations of his are still coming to light 
through publications he coauthored that are just now appear- 
ing in print.) 

We have calculated the number of second neighbors of the
average vertex in these networks in five different ways: using
the simple estimate discussed in the introduction, using
the three progressively more sophisticated estimates, Eqs. (8),
(12), and (14), developed here, and directly by exhaustive
measurement of the networks themselves. The results are
summarized in Table I. As expected, the simple method of estimating
c_2, which assumes it to be equal to the square of the
mean degree, gives an underestimate for both networks, by a
factor of more than two for the mathematicians and more than
five for the biomedical scientists. Moreover, we have been
quite generous to the simple method in this calculation, omitting
from the formulas any correction for transitivity, such as
the lead-in factors discussed in the introduction. Including
such a correction gives estimates of c_2 = 10.9 and 163.3 for
the two networks. These estimates are too low by factors of
over 3 and 7 respectively, large enough errors to be problematic
in almost any application.

                                                      estimate of c_2
network       actors   mean degree  clustering   simple     good   better     best   actual c_2
mathematics   253339       3.92       0.150        15.4     47.9     40.7     33.8       36.4
biomedicine  1520251      15.53       0.081       241.1   2006.0   1843.6   1357.6     1254.0

TABLE I: Summary of results for collaboration networks of mathematicians and biomedical scientists.
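The "actual" values of c_2 come from exhaustive measurement of the networks. A minimal sketch of such a measurement is given below, assuming the network fits in memory as a dictionary of neighbor sets. The function name and the four-vertex test network are ours, and the code is an illustration of the idea rather than the program used to produce Table I.

```python
def mean_second_neighbors(adjacency):
    """Mean number of vertices at distance exactly two, averaged over all vertices.

    adjacency maps each vertex to the set of its neighbors (undirected graph).
    """
    total = 0
    for vertex, first_ring in adjacency.items():
        second_ring = set()
        for neighbor in first_ring:
            second_ring.update(adjacency[neighbor])
        second_ring -= first_ring      # drop vertices already one step away
        second_ring.discard(vertex)    # drop the ego itself
        total += len(second_ring)      # each second neighbor counted once only
    return total / len(adjacency)


# Hypothetical test network: a triangle A-B-C with D attached to C.
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(mean_second_neighbors(toy))
```

Because second neighbors are collected in a set, a vertex reachable by several paths of length two is counted once, which is exactly the over-counting that the mutuality correction is meant to account for in the analytic estimates.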

By contrast, the new method does much better. The "good"
and "better" estimates, Eqs. (8) and (12), give figures of the
same general order of magnitude as the true result, and provide
good rule-of-thumb guides to the expected value of c_2.
But the best estimate, Eq. (14), making use of Eq. (15) to calculate
the mutuality coefficient M, does better still, giving figures
for c_2 which are within 8% and 9% of the known correct
answers for the mathematics and biomedicine networks respectively.
Clearly this is a big improvement over the simple
estimate. Eq. (14) appears to be accurate enough to give very
useful estimates of numbers of friends of friends in real social 
networks. 
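The internal consistency of Table I can be checked with a few lines of arithmetic. The sketch below uses only the numbers printed in the table; the helper `check_row` is a hypothetical name. It recomputes the simple estimate as the square of the mean degree, the "better" estimate as (1 − C) times the "good" one, the mutuality coefficient implied by the ratio of "best" to "better", and the relative error of the best estimate.

```python
TABLE_I = {
    "mathematics": dict(mean_degree=3.92, clustering=0.150, simple=15.4,
                        good=47.9, better=40.7, best=33.8, actual=36.4),
    "biomedicine": dict(mean_degree=15.53, clustering=0.081, simple=241.1,
                        good=2006.0, better=1843.6, best=1357.6, actual=1254.0),
}


def check_row(row):
    """Recompute derived quantities for one row of Table I."""
    return {
        "simple": row["mean_degree"] ** 2,                     # c^2
        "better": (1 - row["clustering"]) * row["good"],       # Eq. (12) from Eq. (8)
        "implied_M": row["best"] / row["better"],              # ratio of Eq. (14) to Eq. (12)
        "best_rel_error": abs(row["best"] - row["actual"]) / row["actual"],
        "simple_shortfall": row["actual"] / row["simple"],     # factor by which c^2 is too low
    }


checks = {name: check_row(row) for name, row in TABLE_I.items()}
for name, result in checks.items():
    print(name, {key: round(value, 3) for key, value in result.items()})
```

The recomputed values agree with the tabulated ones to within rounding, and the relative errors of the best estimate come out a little over 7% and 8%, consistent with the "within 8% and 9%" quoted above.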



V. CONCLUSIONS 

There are a number of morals to this story. Perhaps the most 
important of them is that your friends just aren't normal. No 
one's friends are. By the very fact of being someone's friend, 
friends select themselves. Friends are by definition friendly 
people, and your circle of friends will be a biased sample of 
the population because of it. This is a relevant issue for many 
social network studies, but particularly for studies using ego- 
centered techniques such as snowball sampling. 

In this paper we have not only argued that your friends are 
unusual people, we have also shown (in a rather limited sense) 
how to accommodate their unusualness. By careful consid- 
eration of biases in sampling and correlation effects such as 
transitivity in the network, we can make accurate estimates of 
how many people your friends will be friends with. We have 
demonstrated that the resulting formulas work well for real 
social networks, taking the example of two academic coau- 
thorship networks, for which the mean number of a person's 
second neighbors in the network can be measured directly as 
well as estimated from our equations. 

It is important to note, however, that application of the formulas
we have given requires the experimenter to measure
certain additional parameters of the network. In particular, it
is not enough to know only the mean number of ties an actor
has. One needs to know also the distribution of that number.
Measuring this distribution is not a trivial undertaking, although
some promising progress has been made recently.
One must also find the clustering coefficient of the network, 
which requires us to measure how many pairs of friends of an 
individual are themselves friends. This may require the inclu- 
sion of additional questions in surveys as well as additional 
analysis. 

To return then to the question with which we opened this 
paper, can we estimate how many friends of friends a person 
will have on average who fall into a given group or who were 
involved in a given event? If the number involved in the event 
is e as before, and the total population is t, then the number we
want, call it m_2, is given by m_2 = c_2 e/t. Thus, once we have
c_2 we can answer our question easily enough. Using figures
appropriate for the United States and the simple estimate of
c_2 that it is equal to the square c^2 of the number of acquaintances
the average person has, we get c_2 = 290^2 = 84 100,
t = 280 million, and m_2 = e/3330. As we have seen here,
however, this probably underestimates the actual figure considerably.
The real number could be a factor of five or more
greater than this formula suggests. Unfortunately, as far as we
know, the necessary data have not been measured for typical
personal acquaintance networks to allow us to estimate c_2 by
the methods described here. In particular, measurements of 
the clustering coefficient are at present lacking. We encour- 
age those involved in empirical studies of these networks to 
measure these things soon. 
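The closing arithmetic is easy to reproduce. The sketch below evaluates m_2 = c_2 e/t with the simple estimate c_2 = c^2, using the figures quoted in the text. The function name is hypothetical, and applying it to e = 1 million (the figure from the earlier HIV example) is our illustration only.

```python
def group_members_two_steps_away(e, c2, t):
    """m_2 = c_2 * e / t: expected number of friends of friends in a group of size e."""
    return c2 * e / t


c = 290                 # mean number of acquaintances
t = 280_000_000         # total population
c2_simple = c ** 2      # simple estimate of c_2

people_per_contact = t / c2_simple   # the divisor in m_2 = e / 3330
m2_simple = group_members_two_steps_away(1_000_000, c2_simple, t)
print(c2_simple, round(people_per_contact), round(m2_simple))
```

The divisor comes out at about 3329, which rounds to the e/3330 quoted above. If the true c_2 were five times the simple estimate, m_2 would scale up by the same factor, since m_2 is linear in c_2.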